9.3 Moments of Distributions

109

Table 9.1 Values of the

functionupper F left parenthesis r 1 comma r 2 right parenthesisF(r1,r2)

StartAbsoluteValue r 1 minus r 2 EndAbsoluteValue| r1r2 |

upper F left parenthesis r 1 comma r 2 right parenthesisF(r1,r2)

greater than 1>1

0

11

1

00

2

Here, we shall only derive the distribution of runs of two kinds of elements. More

complicated results may be found by reference to Mood’s paper (1940).

Let the two kinds of elements beaa andbb (they could be purines and pyrimidines),

and let there ben 1n1 aas andn 2n2 bbs, withn 1 plus n 2 equals nn1 + n2 = n.r Subscript 1 ir1i will denote the number of runs

ofaa of length ii, withsigma summation Underscript i Endscripts r Subscript 1 i Baseline equals r 1E

i r1i = r1, and so on. It follows that sigma summation i r Subscript 1 i Baseline equals n 1E ir1i = n1, and so on.

Given a set ofaas andbbs, the number of different arrangements of the runs ofaa andbb

are given by multinomial coefficients and the total number of ways of obtaining the

set r Subscript j i Baseline left parenthesis j equals 1 comma 2 semicolon i equals 1 comma 2 comma ellipsis comma n 1 right parenthesisr ji ( j = 1, 2; i = 1, 2, . . . , n1) is

upper N left parenthesis r Subscript j i Baseline right parenthesis equals StartBinomialOrMatrix r 1 Choose r Subscript 1 i Baseline EndBinomialOrMatrix StartBinomialOrMatrix r 2 Choose r Subscript 2 i Baseline EndBinomialOrMatrix upper F left parenthesis r 1 comma r 2 right parenthesis commaN(r ji) =

| r1

r1i

| | r2

r2i

|

F(r1,r2) ,

(9.40)

where the special function upper F left parenthesis r 1 comma r 2 right parenthesisF(r1,r2) is the number of ways of arranging r 1r1 objects

of one kind andr 2r2 objects of another so that no two adjacent objects are of the same

kind (see Table 9.1).

Since there areStartBinomialOrMatrix n Choose n 1 EndBinomialOrMatrix

( n

n1

)

possible arrangements of theaas andbbs, the distribution of the

r Subscript j ir ji is

upper P left parenthesis r Subscript j i Baseline right parenthesis equals StartFraction upper N left parenthesis r Subscript j i Baseline right parenthesis upper F left parenthesis r 1 comma r 2 right parenthesis Over StartBinomialOrMatrix n Choose n 1 EndBinomialOrMatrix EndFraction periodP(r ji) = N(r ji)F(r1,r2)

( n

n1

)

.

(9.41)

9.3.2

The Hypergeometric Distribution

Continuing the notation of the previous subsection, consider choosing rr elements

at random from the binary mixture of aas and bbs. What is the probability q Subscript kqk that the

group will contain exactly kk aas? It must necessarily contain r minus krk bbs, and the two

types of elements can be chosen in StartBinomialOrMatrix n 1 Choose k EndBinomialOrMatrix

(n1

k

)

and StartBinomialOrMatrix n minus n 1 Choose r minus k EndBinomialOrMatrix

(nn1

rk

)

ways, respectively. Since any

choice of kk aas can be combined with any choice of r minus krk bbs,

q Subscript k Baseline equals StartFraction StartBinomialOrMatrix n 1 Choose k EndBinomialOrMatrix StartBinomialOrMatrix n minus n 1 Choose r minus k EndBinomialOrMatrix Over StartBinomialOrMatrix n Choose r EndBinomialOrMatrix EndFraction periodqk =

(n1

k

)(nn1

rk

)

(n

r

)

.

(9.42)

This system of probabilities is called the hypergeometric distribution (because the

generating function ofq Subscript kqk is expressible in terms of hypergeometric functions). Many

combinatorial problems can be reduced to this form.